Scaling Marginalized Importance Sampling to High-Dimensional State-Spaces via State Abstraction
Authors
Abstract
We consider the problem of off-policy evaluation (OPE) in reinforcement learning (RL), where the goal is to estimate the performance of an evaluation policy, π_e, using a fixed dataset, D, collected by one or more policies that may be different from π_e. Current OPE algorithms may produce poor OPE estimates under policy distribution shift, i.e., when the probability of a particular state-action pair occurring under π_e is very different from the probability of the same pair occurring in D. In this work, we propose to improve the accuracy of OPE estimators by projecting the high-dimensional state-space into a low-dimensional state-space using concepts from the state abstraction literature. Specifically, we consider marginalized importance sampling (MIS) OPE algorithms, which compute state-action distribution correction ratios to produce their OPE estimate. In the original ground state-space, these ratios can have high variance, which can lead to high-variance OPE. However, we prove that in the lower-dimensional abstract state-space the ratios can have lower variance, resulting in lower-variance OPE. We then highlight the challenges that arise when estimating the abstract ratios from data, identify sufficient conditions to overcome these issues, and present a minimax optimization problem whose solution yields the abstract ratios. Finally, our empirical evaluation on difficult, high-dimensional state-space OPE tasks shows that the abstract ratios can make MIS estimators achieve lower mean-squared error and be more robust to hyperparameter tuning than the ground ratios.
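For context, the marginalized importance sampling estimate the abstract refers to takes roughly the following form (a sketch in assumed notation, not taken verbatim from the paper, where d^{\pi_e} and d^{D} denote the state-action occupancy under the evaluation policy and under the data-collecting policies):

\[
  w(s, a) = \frac{d^{\pi_e}(s, a)}{d^{D}(s, a)}, \qquad
  \hat{J}(\pi_e) = \frac{1}{|D|} \sum_{(s, a, r) \in D} w(s, a)\, r .
\]

Estimating w over abstract states \phi(s) rather than ground states s is the variance-reduction step the abstract describes.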
Similar References
Adaptive Importance Sampling for Markov Chains on General State Spaces
Adaptive importance sampling involves successively estimating the function of interest and then constructing an importance sampling scheme built on the estimate. Here, we investigate such a scheme used in simulations of Markov chains derived from particle transport problems. Previous work had shown that for finite state spaces the convergence was exponential, which verified computational experienc...
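The adaptive idea can be illustrated with a toy far simpler than the particle-transport setting above: estimate a Gaussian tail probability, re-centering the proposal on the weighted mean of the event samples after each round. A minimal sketch assuming NumPy; the event and the Gaussian proposal family are choices made for the example:

import numpy as np

rng = np.random.default_rng(0)
mu, estimates = 0.0, []
for _ in range(10):
    x = rng.normal(mu, 1.0, size=10_000)         # draw from proposal q = N(mu, 1)
    w = np.exp(-0.5 * x**2 + 0.5 * (x - mu)**2)  # likelihood ratio p(x)/q(x), p = N(0, 1)
    f = (x > 3.0).astype(float)                  # event of interest: X > 3
    estimates.append(np.mean(w * f))             # unbiased IS estimate for this round
    if np.sum(w * f) > 0:                        # adapt: re-center the proposal on the
        mu = np.sum(w * f * x) / np.sum(w * f)   # weighted mean of the event samples
print(np.mean(estimates))                        # ~1.35e-3, i.e. 1 - Phi(3)

After the first adaptation the proposal sits near the conditional mean E[X | X > 3], so later rounds have far lower variance than crude Monte Carlo.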
State-dependent importance sampling schemes via minimum cross-entropy
We present a method to obtain state- and time-dependent importance sampling estimators by repeatedly solving a minimum cross-entropy (MCE) program as the simulation progresses. This MCE-based approach lends a foundation to the natural notion to stop changing the measure when it is no longer needed. We use this method to obtain a state- and time-dependent estimator for the one-tailed probability of ...
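A minimal sketch of the multilevel minimum cross-entropy recipe in one dimension (the Exp(1) model, gamma, and rho are illustrative assumptions, not the estimator from the abstract): raise an intermediate level each round and update the proposal mean with the analytic MCE formula until the target level is reached.

import numpy as np

rng = np.random.default_rng(1)
gamma, v, rho = 20.0, 1.0, 0.1                        # estimate P(X > 20), X ~ Exp(1)
for _ in range(20):
    x = rng.exponential(v, size=50_000)               # draw from proposal Exp(mean=v)
    w = v * np.exp(-x * (1.0 - 1.0 / v))              # likelihood ratio p(x)/q(x)
    level = min(np.quantile(x, 1.0 - rho), gamma)     # current elite level
    elite = x >= level
    v = np.sum(w[elite] * x[elite]) / np.sum(w[elite])  # analytic MCE update of the mean
    if level >= gamma:
        break
print(np.mean(w * (x > gamma)))                       # ~2.06e-9, i.e. exp(-20)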
Efficient High-Dimensional Importance Sampling
The paper describes a simple, generic and yet highly accurate Efficient Importance Sampling (EIS) Monte Carlo (MC) procedure for the evaluation of high-dimensional numerical integrals. EIS is based upon a sequence of auxiliary weighted regressions which actually are linear under appropriate conditions. It can be used to evaluate likelihood functions and byproducts thereof, such as ML estimators...
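The auxiliary-regression idea can be shown on a one-dimensional toy (the quartic integrand and Gaussian sampler are assumptions for the example; EIS proper uses weighted regressions over a sequence of periods). Within a Gaussian family, regressing the log-integrand on the sufficient statistics (x, x^2) is exactly a linear problem:

import numpy as np

rng = np.random.default_rng(2)
f = lambda x: np.exp(-((x - 2.0) ** 4))              # integrand; exact integral is Gamma(1/4)/2
mu, sigma = 0.0, 3.0                                 # initial Gaussian sampler
for _ in range(5):
    x = rng.normal(mu, sigma, size=20_000)
    X = np.column_stack([np.ones_like(x), x, x * x]) # sufficient statistics of the family
    beta = np.linalg.lstsq(X, np.log(f(x) + 1e-300), rcond=None)[0]
    sigma = np.sqrt(-0.5 / beta[2])                  # map regression coefficients back
    mu = beta[1] * sigma**2                          # to the Gaussian (mu, sigma)
x = rng.normal(mu, sigma, size=20_000)
q = np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))
print(np.mean(f(x) / q))                             # ~1.81, close to Gamma(1/4)/2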
Near Optimal Behavior via Approximate State Abstraction
The combinatorial explosion that plagues planning and reinforcement learning (RL) algorithms can be moderated using state abstraction. Prohibitively large task representations can be condensed such that essential information is preserved, and consequently, solutions are tractably computable. However, exact abstractions, which treat only fully-identical situations as equivalent, fail to present ...
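A minimal sketch of an approximate abstraction in this spirit: greedily cluster states whose optimal action-values agree within epsilon for every action, so near-identical states share one abstract state. Q here is an assumed |S| x |A| array of Q* values, not an interface from the cited paper.

import numpy as np

def approx_abstraction(Q: np.ndarray, epsilon: float) -> np.ndarray:
    """Map each ground state to an abstract state by epsilon-similarity of Q-values."""
    phi = -np.ones(Q.shape[0], dtype=int)
    reps = []                                   # representative ground state per cluster
    for s in range(Q.shape[0]):
        for k, r in enumerate(reps):
            if np.max(np.abs(Q[s] - Q[r])) <= epsilon:
                phi[s] = k                      # join an existing abstract state
                break
        else:
            phi[s] = len(reps)                  # open a new abstract state
            reps.append(s)
    return phi

Q = np.array([[1.00, 0.20], [1.05, 0.21], [3.00, 2.00]])
print(approx_abstraction(Q, epsilon=0.1))       # -> [0 0 1]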
Deep TAMER: Interactive Agent Shaping in High-Dimensional State Spaces
While recent advances in deep reinforcement learning have allowed autonomous learning agents to succeed at a variety of complex tasks, existing algorithms generally require a lot of training data. One way to increase the speed at which agents are able to learn to perform tasks is by leveraging the input of human trainers. Although such input can take many forms, real-time, scalar-valued feedbac...
Journal
Journal title: Proceedings of the ... AAAI Conference on Artificial Intelligence
Year: 2023
ISSN: 2159-5399, 2374-3468
DOI: https://doi.org/10.1609/aaai.v37i8.26128